TD Control: Q-Learning

Please watch the video below to learn about Q-Learning (or Sarsamax), a second method for TD control.
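As a rough companion to the video, here is a minimal sketch of tabular Q-Learning (Sarsamax). It applies the update Q(S, A) ← Q(S, A) + α(R + γ max_a Q(S', a) − Q(S, A)). Unlike Sarsa, the target uses the greedy (max) value at the next state, regardless of which action the ε-greedy behavior policy actually takes next. The five-state corridor environment (`corridor_step`), the helper names, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, done, alpha, gamma):
    """One Sarsamax update: bootstrap from the max action value at next_state."""
    # A terminal transition has no next action value to bootstrap from.
    target = reward if done else reward + gamma * max(Q[next_state])
    # Move the current estimate a step of size alpha toward the TD target.
    Q[state][action] += alpha * (target - Q[state][action])


def epsilon_greedy(Q, state, epsilon, n_actions, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = Q[state]
    best = max(values)
    # Break ties randomly among the greedy actions.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def corridor_step(state, action, n_states=5):
    """Hypothetical toy environment: action 0 moves left, action 1 moves right.

    Reaching the last state ends the episode with reward 1; all other steps give 0.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(num_episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-Learning on the toy corridor and return the Q-table."""
    rng = random.Random(seed)
    n_actions = 2
    # Q-table: maps each state to a list of action-value estimates (initialized to 0).
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(Q, state, epsilon, n_actions, rng)
            next_state, reward, done = corridor_step(state, action)
            q_learning_update(Q, state, action, reward, next_state, done, alpha, gamma)
            state = next_state
    return Q


Q = train()
# Greedy policy read off the learned Q-table for the non-terminal states 0..3.
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy_policy)
```

Because the update always bootstraps from the greedy action, Q-Learning learns about the greedy policy while behaving ε-greedily. This is why it is described as an off-policy TD control method.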

Check out this (optional) research paper to read the proof that Q-Learning (or Sarsamax) converges.
